This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

feature(Types): added Experiment type encompassing Pipelines, metrics, project_name, etc. #114

Merged
merged 5 commits into from
Jul 31, 2022

Conversation

almostintuitive
Contributor

@almostintuitive almostintuitive commented Jul 30, 2022

Closes #115
Closes #109

@almostintuitive almostintuitive changed the title feature(Types): added Experiment type encompassing Pipelines, metrics… feature(Types): added Experiment type encompassing Pipelines, metrics, project_name, etc. Jul 30, 2022
```python
save_remote: Optional[
    bool
] = None,  # If True, all models will try to upload (if configured); if False, uploading is disabled for all models (even if configured)
remote_logging: Optional[
```

Contributor Author

It looks like remote_logging was not used anywhere! Did we have it wired up on the latest main?

Contributor

It is used in one place: in run.py!

```python
logger_plugins = (
    [
        WandbPlugin(
            WandbConfig(
                project_id=project_id,
                run_name=config.run_name + "-" + pipeline.id,
                train=True,
            ),
            dict(
                run_config=config.get_configs(),
                preprocess_config=preprocess_config.get_configs(),
                pipeline_configs=pipeline.get_configs(),
            ),
        )
    ]
    if config.remote_logging
    else []
)
```

Contributor Author

Sorry, my mistake! I meant save_remote, not remote_logging!
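
A minimal sketch of how the save_remote override described at the top of this thread could resolve against a model's own upload setting, under one reading of that comment; the helper and argument names are hypothetical, not from this PR:

```python
from typing import Optional


def resolve_save_remote(model_configured_to_upload: bool, save_remote: Optional[bool]) -> bool:
    # Hypothetical helper: False disables uploading for every model, even if configured;
    # True and None defer to the model's own upload configuration.
    return model_configured_to_upload and save_remote is not False
```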

```python
self.plugins = obligatory_plugins + plugins

self.pipeline = overwrite_model_configs(self.config, self.pipeline)
self.pipeline = experiment.pipeline  # maybe we want to deepcopy it first?
```

Contributor Author

Question: should we deepcopy the pipeline before we modify it?

Contributor Author

I think you may have already asked this question.

Contributor

It can be deepcopied, although it won't make a difference in practice, because we run the function with each run as well.
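
If the deepcopy were added, it might look like the two lines below, mirroring the diff in this thread; `overwrite_model_configs` and the attributes come from that diff, the deepcopy is only the proposed change, not what was merged:

```python
from copy import deepcopy

# Copy the Experiment's pipeline before mutating it, so the shared
# Experiment object is left untouched between runs.
self.pipeline = deepcopy(experiment.pipeline)
self.pipeline = overwrite_model_configs(self.config, self.pipeline)
```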

```python
run_name: str  # Gets added as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
```

Contributor Author

I needed to use a forward declaration here, otherwise we're in for some dependency-cycle fun!
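
A sketch of the forward-reference pattern being referred to; the module path and the use of TYPE_CHECKING are assumptions here, only the `pipeline: "Pipeline"` annotation comes from the diff:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type checkers only, so this module and the pipeline
    # module never import each other at runtime, avoiding the cycle.
    from .pipeline import Pipeline


class Experiment:
    pipeline: "Pipeline"  # string annotation, resolved lazily by type checkers
```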

```python
run_name: str  # Gets added as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
metrics: Evaluators
```

Contributor Author

metrics looks relatively useful here, as it may change based on the experiment?

```python
run_name: str  # Gets added as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
metrics: Evaluators
preprocessing_config: PreprocessConfig
```

Contributor Author

So actually there could be multiple preprocessing_configs if we merge the data. This is in theory incorrect; in practice, it's probably fine to pass in the one we want to log.

Contributor

You mean in case there are multiple initial data sources?

Contributor Author

Right now, multiple ones that we merge together.
But later there'll be multiple data sources; I completely forgot about that as well!
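
If the field were later widened to cover multiple merged data sources, one option (not part of this PR) would be a list of configs; only the relevant field is sketched here, and whether Experiment is a dataclass is an assumption:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experiment:
    # Hypothetical future shape: one PreprocessConfig per data source,
    # with the current single-source case being a list of one.
    preprocessing_configs: List[PreprocessConfig]
```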

@almostintuitive
Contributor Author

The downside of this is that if I only want to change the pipeline we're running, but keep the rest intact, I'll need to do it in a loop or list comprehension to create the different Experiments.
But I think that's fine. wdyt?
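
That loop could look something like the sketch below, assuming Experiment is a dataclass; `base_experiment` and `candidate_pipelines` are placeholder names, not from the PR:

```python
from dataclasses import replace

# Vary only the pipeline; every other Experiment field is carried over unchanged.
experiments = [
    replace(base_experiment, pipeline=pipeline)
    for pipeline in candidate_pipelines
]
```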

@almostintuitive almostintuitive marked this pull request as ready for review July 30, 2022 10:40
Contributor

@szemyd szemyd left a comment


👍 Looks good

@almostintuitive almostintuitive merged commit 76e04df into main Jul 31, 2022
@almostintuitive almostintuitive deleted the feature/experiments branch July 31, 2022 17:37